Real-Time Regression Analysis of Streaming Clustered Data With Possible Abnormal Data Batches
Abstract
This article develops an incremental learning algorithm based on quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both offline QIF and offline generalized estimating equations (GEE) approach that process the entire cumulative subject-level data all together, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential goodness-of-fit test as a screening procedure on occurrences of abnormal data batches. We implement the proposed methodology by expanding existing Spark’s Lambda architecture for the operation of statistical inference and data quality diagnosis. We illustrate the proposed methodology by extensive simulation studies and an analysis of streaming car crash datasets from the National Automotive Sampling System-Crashworthiness Data System (NASS CDS). Supplementary materials for this article are available online.
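The renewable-estimation idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the RenewQIF algorithm: it assumes an ordinary linear model with independent errors, where the summary statistics are just the accumulated X'X and X'y, and the class name `RenewableLeastSquares` is hypothetical. It only shows that an estimate renewed batch by batch from summary statistics can match the offline fit on the pooled raw data.

```python
import numpy as np


class RenewableLeastSquares:
    """Hypothetical illustration (not RenewQIF): least squares renewed
    from summary statistics only, never revisiting earlier raw batches."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))  # running sum of X'X
        self.xty = np.zeros(n_features)                # running sum of X'y
        self.n_obs = 0

    def update(self, X, y):
        """Fold one new data batch into the summaries and return the estimate."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n_obs += X.shape[0]
        return self.estimate()

    def estimate(self):
        """Solve the normal equations built from the accumulated summaries."""
        return np.linalg.solve(self.xtx, self.xty)


# Simulated stream: 5 batches of 200 observations each (assumed setup).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
model = RenewableLeastSquares(n_features=3)
batches = []
for _ in range(5):
    X = rng.normal(size=(200, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=200)
    batches.append((X, y))  # stored only to build the offline comparison
    beta_stream = model.update(X, y)

# Offline fit that processes the entire cumulative data at once.
X_all = np.vstack([X for X, _ in batches])
y_all = np.concatenate([y for _, y in batches])
beta_offline = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(np.allclose(beta_stream, beta_offline))
```

In the linear case the streaming and offline estimates coincide exactly up to floating-point error. For correlated outcomes and nonlinear models, as treated in the article, the renewal step would instead update an estimating function and its derivative, so the agreement is asymptotic rather than exact.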
Similar resources
Fuzzy Data Envelopment Analysis for Classification of Streaming Data
The classification of fuzzy uncertain data is considered one of the most challenging issues in data analysis. In spite of the significance of fuzzy data in mathematical programming, the development of the analytical methods of fuzzy data is slow. Therefore, the current study proposes a new fuzzy data classification method based on fuzzy data envelopment analysis (DEA) which can handle strea...
Local polynomial regression analysis of clustered data
This paper proposes a classical weighted least squares type of local polynomial smoothing for the analysis of clustered data, with the key idea of using generalised inverses of correlation matrices. The estimator has a simple closed-form expression. Simplicity is achieved also for nonparametric generalised linear models with arbitrary link function via a transformation. Our approach can be char...
Quantile regression with clustered data Paulo
We show that the quantile regression estimator is consistent and asymptotically normal when the error terms are correlated within clusters but independent across clusters. A consistent estimator of the covariance matrix of the asymptotic distribution is provided and we propose a specification test capable of detecting the presence of intra-cluster correlation. A small simulation study illustrat...
Real-Time Streaming Data Delivery over Named Data Networking
Named Data Networking (NDN) is a proposed future Internet architecture that shifts the fundamental abstraction of the network from host-to-host communication to request-response for named, signed data–an information dissemination focused approach. This paper describes a general design for receiver-driven, real-time streaming data (RTSD) applications over the current NDN implementation that aims...
Journal
Journal title: Journal of the American Statistical Association
Year: 2022
ISSN: ['0162-1459', '1537-274X', '2326-6228', '1522-5445']
DOI: https://doi.org/10.1080/01621459.2022.2026778